Information Retrieval Based on Extraction of Domain Specific Significant Keywords and Other Relevant Phrases from a Conceptual Semantic Network Structure

Authors

  • Mohammad Moinul Hoque
  • Prakash Poudyal
  • Paulo Quaresma
Abstract

This paper presents a functional approach to building an Information Retrieval System driven by narrative search text. The presented system retrieves documents from a background collection by extracting domain-specific significant keywords and other relevant phrases from a given narrative search text. The narrative can be a description or a scenario, which makes it difficult to retrieve the relevant document sets efficiently and accurately from the background data repository. We adopt an approach in which the significant keywords are extracted from the narrative text to form a search query, and alternative sets of queries are formulated by expanding that query using a Conceptual Semantic Network built for the purpose. Including the Conceptual Semantic Network and WordNet synonym sets in the query expansion plays an important role in the retrieval mechanism. Experiments were carried out on the data sets from the Ad-hoc task of the ‘Forum for Information Retrieval Evaluation, 2013’. The background data set contained over 3 GB of legal documents and was divided into two domains: ‘Consumer Law’ and ‘Hindu Marriage & Divorce Law’. The search queries were provided as a set of scenarios in narrative form. For each query, the system was required to analyze the search text and retrieve the top 1000 legal documents from the background collection that may be relevant to the situation described in the narration.
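
The extract-then-expand idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stopword list, the frequency-based keyword ranking, and the small hand-written `SYNONYMS` map (standing in for the Conceptual Semantic Network and WordNet synonym sets) are all assumptions made for illustration, as are the function names `extract_keywords` and `expand_query`.

```python
import re
from collections import Counter

# Hypothetical stopword list; the paper does not specify one.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "for", "on",
             "was", "is", "by", "with", "that", "because"}

# Hypothetical synonym map standing in for a semantic network or WordNet synsets.
SYNONYMS = {
    "refund": ["reimbursement", "repayment"],
    "defective": ["faulty"],
}

def extract_keywords(narrative, top_k=5):
    """Rank non-stopword tokens by frequency (ties broken alphabetically)."""
    tokens = re.findall(r"[a-z]+", narrative.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]

def expand_query(keywords, synonyms=SYNONYMS):
    """Return the base query plus one variant per single-term synonym swap."""
    queries = [list(keywords)]
    for i, word in enumerate(keywords):
        for alt in synonyms.get(word, []):
            variant = list(keywords)
            variant[i] = alt  # replace exactly one term with its synonym
            queries.append(variant)
    return queries
```

Calling `expand_query(extract_keywords(text))` yields the base query followed by its alternatives, each of which could then be submitted to whatever search backend indexes the background collection.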


Similar resources

Enhancing Retrieval Effectiveness of Malay Documents by Exploiting Implicit Semantic Relationship between Words

Phrases have a long history in information retrieval, particularly in commercial systems. Implicit semantic relationships between words in the form of BaseNP have shown significant improvement in terms of precision in many IR studies. Our research focuses on linguistic phrases, which are language dependent. Our results show that using BaseNP can improve performance although above 62% of words formatio...


Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches lead researchers to combine approaches and to use semi-automatic retrieval with user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provided two kind of qu...


CRYSTAL: Inducing a Conceptual Dictionary (In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, IJCAI '95)

One of the central knowledge sources of an information extraction (IE) system is a dictionary of linguistic patterns that can be used to identify references to relevant information in a text. Automatic creation of conceptual dictionaries is important for portability and scalability of an IE system. This paper describes CRYSTAL, a system which automatically induces a dictionary of concept-node ...


Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages

Reverse engineering of network applications, especially from the security point of view, is of high importance and interest. Many network applications use proprietary protocols whose specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...


Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download the domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...



Publication date: 2013